Unsupervised learning of edit parameters for matching name variants

Authors

  • Daniel Gillick
  • Dilek Z. Hakkani-Tür
  • Michael Levit
Abstract

Since named entities are often written in different ways, question answering (QA) and other language processing tasks stand to benefit from entity matching. We address the problem of finding equivalent person names in unstructured text. Our approach is a generalization of spelling correction: we compare an input name to candidate matches by applying a set of edits to it. We introduce a novel unsupervised method for learning spelling edit probabilities, which improves overall F-measure on our own name-matching task by 12%. Relevance is demonstrated by application to the GALE Distillation task.
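
The abstract does not spell out the edit model, so the following is only a minimal sketch of the general idea it describes: scoring a candidate name by the cheapest sequence of character edits, where each edit's cost is the negative log of its probability. The probability table, the fallback value, and the function names (`edit_cost`, `name_distance`, `best_match`) are all hypothetical and chosen for illustration; the learned probabilities and the exact edit set used by the authors are not given here.

```python
import math

# Hypothetical edit probabilities, invented for illustration only.
# Keys are (source_char, target_char); "" stands for "no character",
# so ("", "h") is an insertion of "h" and ("h", "") is a deletion of "h".
EDIT_PROBS = {
    ("k", "c"): 0.2,
    ("c", "k"): 0.2,
    ("", "h"): 0.1,
    ("h", ""): 0.1,
}
DEFAULT_EDIT_PROB = 0.01  # assumed fallback for any edit not listed above
MATCH_PROB = 0.95         # assumed probability of copying a character unchanged


def edit_cost(src, tgt):
    """Cost of one edit as a negative log-probability (lower is more likely)."""
    if src == tgt:
        return -math.log(MATCH_PROB)
    return -math.log(EDIT_PROBS.get((src, tgt), DEFAULT_EDIT_PROB))


def name_distance(source, target):
    """Cost of the cheapest edit sequence turning source into target.

    This is a standard weighted Levenshtein dynamic program.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):  # delete every character of source
        dist[i][0] = dist[i - 1][0] + edit_cost(source[i - 1], "")
    for j in range(1, cols):  # insert every character of target
        dist[0][j] = dist[0][j - 1] + edit_cost("", target[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + edit_cost(source[i - 1], ""),       # deletion
                dist[i][j - 1] + edit_cost("", target[j - 1]),       # insertion
                dist[i - 1][j - 1] + edit_cost(source[i - 1], target[j - 1]),  # match or substitution
            )
    return dist[rows - 1][cols - 1]


def best_match(name, candidates):
    """Return the candidate with the lowest weighted edit cost to name."""
    return min(candidates, key=lambda c: name_distance(name.lower(), c.lower()))
```

With this toy table, `best_match("Katherine", ["Matherine", "Catherine"])` prefers "Catherine", because the k-to-c substitution is assigned a higher probability than an unlisted substitution. In a setting like the one the abstract describes, such probabilities would be learned from data rather than written by hand.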

Similar resources

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find the multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...


Using Long-Term Structure to Retrieve Music: Representation and Matching

We present a measure of the similarity of the long-term structure of musical pieces. The system deals with raw polyphonic data. Through unsupervised learning, we generate an abstract representation of music, the “texture score”. This “texture score” can be matched to other similar scores using a generalized edit distance, in order to assess structural similarity. We notably apply this algorithm ...


Unsupervised Learning of Edit Distance Weights for Retrieving Historical Spelling Variations

While today's orthography is very strict and seldom changes, this has not always been true. In historical texts, the spelling of words often not only differs from today's but in some periods even varies from use to use within a single text. Information retrieval on historical corpora can deal with these variations using fuzzy matching techniques based on Levenshtein distance with stochastic weights. In pa...


Schema Matching using Machine Learning

Schema Matching is a method of finding attributes that are either linguistically similar to each other or represent the same information. In this project, we take a hybrid approach to solving this problem, using both the provided data and the schema name to perform one-to-one schema matching, and introduce the creation of a global dictionary to achieve one-to-many schema matching. We exper...


A Double Metaphone Encoding for Approximate Name Searching and Matching in Bangla

Almost any word can be a Bengali name, and the name in turn is often spelled in many different ways, all of which are considered correct and interchangeable. The reason for the spelling complication is two-fold: (1) there is a large gap between the script and pronunciation in Bangla, largely attributed to the large-scale Sanskritization process that started in the 12th century and continued throu...



Publication date: 2008